Jinhui Ye

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

May 28, 2026
Via arXiv

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Apr 24, 2026
Via arXiv

StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems

Apr 13, 2026
Via arXiv

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

Mar 23, 2026
Via arXiv

FutureVLA: Joint Visuomotor Prediction for Vision-Language-Action Model

Mar 11, 2026
Via arXiv

ST4VLA: Spatially Guided Training for Vision-Language-Action Models

Feb 10, 2026
Via arXiv

Re-thinking Temporal Search for Long-Form Video Understanding

Apr 03, 2025
Via arXiv

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

Mar 17, 2025
Via arXiv

SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction

Mar 05, 2025
Via arXiv

Improving Gloss-free Sign Language Translation by Reducing Representation Density

May 23, 2024
Via arXiv